Multilingual Detection of Check-Worthy Claims using World Languages and Adapter Fusion

arXiv.org Artificial Intelligence

Check-worthiness detection is the task of identifying claims worthy of investigation by fact-checkers. Resource scarcity for non-world languages and model learning costs remain major challenges for the creation of models supporting multilingual check-worthiness detection. This paper proposes cross-training adapters on a subset of world languages, combined by adapter fusion, to detect claims emerging globally in multiple languages. (1) With a vast number of annotators available for world languages and the storage-efficient adapter models, this approach is more cost-efficient. Models can be updated more frequently and thus stay up to date. (2) Adapter fusion provides insights and allows for interpretation regarding the influence of each adapter model on a particular language. The proposed solution often outperformed the top multilingual approaches in our benchmark tasks.
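To illustrate why fusion weights can be read as the influence of each adapter, here is a minimal sketch of attention-style fusion over adapter outputs. It is not the paper's implementation: the learned query, key, and value projections of adapter fusion are omitted (treated as identity), and the adapter names and vectors are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_adapters(query, adapter_outputs):
    """Combine adapter outputs with dot-product attention.

    query: hidden vector of the current token (assumed input).
    adapter_outputs: dict mapping adapter name -> output vector.
    Returns the fused vector and the per-adapter attention weights.
    """
    names = list(adapter_outputs)
    scores = [sum(q * v for q, v in zip(query, adapter_outputs[n])) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * adapter_outputs[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))

# Hypothetical adapters, one per training language.
outputs = {
    "lang_a": [2.0, 0.0],
    "lang_b": [0.0, 2.0],
    "lang_c": [1.0, 1.0],
}
fused, weights = fuse_adapters([1.0, 0.0], outputs)
```

Inspecting `weights` shows which language adapter contributed most for a given input, which is the kind of interpretation the abstract refers to.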


Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA)

arXiv.org Machine Learning

Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component of the design of intelligent systems enabling the sense-making of highly dynamic and diverse streams of text data. Traditional methods such as Dynamic Topic Modeling (DTM) do not lend themselves well to direct parallelization because of dependencies from one time step to another. In this paper, we introduce and empirically analyze Clustered Latent Dirichlet Allocation (CLDA), a method for extracting dynamic latent topics from a collection of documents. Our approach is based on data decomposition in which the data is partitioned into segments, followed by topic modeling on the individual segments. The resulting local models are then combined into a global solution using clustering. The decomposition and resulting parallelization lead to very fast runtimes even on very large datasets. Our approach furthermore provides insight into how the composition of topics changes over time and can also be applied using other data partitioning strategies over any discrete features of the data, such as geographic features or classes of users. In this paper, CLDA is applied successfully to seventeen years of NIPS conference papers (2,484 documents and 3,280,697 words), seventeen years of computer science journal abstracts (533,560 documents and 32,551,540 words), and to forty years of the PubMed corpus (4,025,978 documents and 273,853,980 words).
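The combination step described above can be sketched as follows. This is only an illustration under stated assumptions: the per-segment topic models are taken as already fitted (each local topic is a word distribution over a shared vocabulary), k-means is assumed as the clustering method, and the segment names and numbers are made up.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    # Component-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(points, k, iters=20):
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = mean(members)
    return labels, centers

# Hypothetical local topics: two segments, two topics each, four-word vocabulary.
local_topics = {
    "segment_1": [[0.7, 0.2, 0.05, 0.05], [0.05, 0.05, 0.6, 0.3]],
    "segment_2": [[0.6, 0.3, 0.05, 0.05], [0.1, 0.0, 0.5, 0.4]],
}

# Flatten while remembering which segment each local topic came from.
keys = [(seg, i) for seg, topics in local_topics.items() for i in range(len(topics))]
points = [local_topics[seg][i] for seg, i in keys]

labels, global_topics = kmeans(points, k=2)
assignment = dict(zip(keys, labels))  # (segment, local topic index) -> global topic id
```

Because each segment is modeled independently, the local fits can run in parallel; `assignment` then shows which local topic in each segment belongs to which global topic, so a global topic's composition can be compared across segments.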


Modeling Leadership Behavior of Players in Virtual Worlds

AAAI Conferences

In this article, we describe our method of modeling sociolinguistic behaviors of players in massively multi-player online games. The focus of this paper is leadership, as it is manifested by the participants engaged in discussion, and the automated modeling of this complex behavior in virtual worlds. We first approach the research question of modeling from a social science perspective, and ground our models in theories from human communication literature. We then adapt a two-tiered algorithmic model that derives certain mid-level sociolinguistic behaviors, such as Task Control, Topic Control and Disagreement, from discourse linguistic indicators and combines these in a weighted model to reveal the complex role of Leadership. The algorithm is evaluated by comparing its prediction of leaders against ground truth, namely the participants' own ratings of leadership of themselves and their conversation peers. We find the algorithm performance to be considerably better than baseline.
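The second tier of such a model, a weighted combination of mid-level behavior scores, can be sketched as below. The weights, participant names, and scores are hypothetical, and a plain weighted sum is assumed; the abstract does not specify the actual weighting scheme.

```python
def leadership_scores(behaviors, weights):
    """Weighted sum of mid-level behavior scores for each participant."""
    return {
        participant: sum(weights[b] * scores[b] for b in weights)
        for participant, scores in behaviors.items()
    }

# Hypothetical weights for the mid-level behaviors named in the abstract.
weights = {"task_control": 0.4, "topic_control": 0.4, "disagreement": 0.2}

# Hypothetical per-participant scores in [0, 1], as a first tier might produce.
behaviors = {
    "player_a": {"task_control": 0.8, "topic_control": 0.6, "disagreement": 0.3},
    "player_b": {"task_control": 0.2, "topic_control": 0.5, "disagreement": 0.7},
}

scores = leadership_scores(behaviors, weights)
predicted_leader = max(scores, key=scores.get)
```

The predicted leader can then be compared with the participants' own leadership ratings, which is how the abstract describes the evaluation.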


Integrating Document Clustering and Topic Modeling

arXiv.org Machine Learning

Document clustering and topic modeling are two closely related tasks which can mutually benefit each other. Topic modeling can project documents into a topic space which facilitates effective document clustering. Cluster labels discovered by document clustering can be incorporated into topic models to extract local topics specific to each cluster and global topics shared by all clusters. In this paper, we propose a multi-grain clustering topic model (MGCTM) which integrates document clustering and topic modeling into a unified framework and jointly performs the two tasks to achieve the overall best performance. Our model tightly couples two components: a mixture component used for discovering latent groups in a document collection and a topic model component used for mining multi-grain topics, including local topics specific to each cluster and global topics shared across clusters. We employ variational inference to approximate the posterior of hidden variables and learn model parameters. Experiments on two datasets demonstrate the effectiveness of our model.


Modeling Socio-Cultural Phenomena in Online Multi-Party Discourse

AAAI Conferences

In this paper, we present the application of a novel approach to the computational modeling, understanding and detection of social phenomena in online multi-party discourse. A two-tiered approach was developed to detect a collection of social phenomena deployed by participants, such as topic control, task control, disagreement and involvement. We discuss how the mid-level social phenomena can be reliably detected in discourse and how these measures can be used to differentiate participants of online discourse. Our approach works across different types of online chat, and we show results on two specific data sets.